Take-Home Exercise 2: Spatio-temporal analysis of COVID-19 Vaccination Trends at the Sub-district Level, DKI Jarkarta

Published

February 13, 2023

Modified

February 24, 2023

1 Context

Since the outbreak of COVID-19, many countries have rushed to create a vaccine and ensure its population is well immunized against this novel coronavirus. On 13 January 2021, the mass immunisation program commenced and since then Indonesia ranks third in Asia and fifth in the world in terms of total doses given.

In this take-home assignment, we will be exploring vaccination rates in DKI Jarkarta, identifying sub-districts with relatively higher number of vaccination rate and how they changed over time.

The tasks given to us is as follows:

Choropleth Mapping and Analysis

  • Compute the monthly vaccination rate from July 2021 to June 2022 at sub-district (also known as kelurahan in Bahasa Indonesia) level,

  • Prepare the monthly vaccination rate maps by using appropriate tmap functions,

  • Describe the spatial patterns revealed by the choropleth maps (not more than 200 words).

Local Gi* Analysis

With reference to the vaccination rate maps prepared in ESDA:

  • Compute local Gi* values of the monthly vaccination rate,

  • Display the Gi* maps of the monthly vaccination rate. The maps should only display the significant (i.e. p-value < 0.05)

  • With reference to the analysis results, draw statistical conclusions (not more than 250 words).

Emerging Hot Spot Analysis(EHSA)

With reference to the local Gi* values of the vaccination rate maps prepared in the previous section:

  • Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,

  • Select three sub-districts and describe the temporal trends revealed (not more than 250 words), and

  • Prepared a EHSA map of the Gi* values of vaccination rate. The maps should only display the significant (i.e. p-value < 0.05).

  • With reference to the EHSA map prepared, describe the spatial patterns revelaed. (not more than 250 words).

Throughout this page, each step will be explained and guided so that you can follow along.

2 The Set Up

2.1 Packages Used

The R packages used for this analysis are:

  • sf

  • tidyverse

  • spatstat

  • tmap

  • sfdep

  • maptools

  • readxl

The code chunk below checks whether the packages have been installed, if not it will automatically install them and load the packages into Rstudio.

pacman::p_load(sf, raster, spatstat, tmap, tidyverse, sfdep, maptools, readxl, plotly)

2.2 The Data

Type Name Format Description
Geospatial Batas Desa Provinsi DKI Jakarta shapefile Sub-districts in DKI Jakarta
Aspatial Riwayat File Vaksinasi DKI Jakarta .xlsx Sub-district level data of vaccination numbers between July 2021 to June 2022
  • Geospatial Data

The link under Geospatial Data above brings you to a page where there are many download links sorted by province. Ensure that you are using Shapefile (SHP) Batas Desa Provinsi DKI Jakarta.

  • Aspatial Data

The link under Aspatial Data above brings you to a page where there are two types of data files you can use. Please choose Data Vaksinasi Berbasis Kelurahan dan Kecamatan and download a total of 12 files beginning July 2021 to June 2022.

Do note that we will be using the beginning of each month for our download.

3 Data Wrangling

3.1 Geospatial Data

3.1.1 Import Geospatial Data

geoDKI <- st_read(dsn = "data/geospatial",
                    layer = "BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")
Reading layer `BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA' from data source 
  `/Users/junhaoteo/Documents/junhao2309/IS415/Take-Home_Ex/Take-Home_Ex02/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 269 features and 161 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 106.3831 ymin: -6.370815 xmax: 106.9728 ymax: -5.184322
Geodetic CRS:  WGS 84

From the output above, we can see that the data set has a geometry type, Multipolygon, and has 269 features and 161 fields.

3.1.2 Check for invalid geometries

Before we begin, we should check whether there are any invalid geometries by using the code chunk below:

st_is_valid(geoDKI)
  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[196] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[211] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[226] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[256] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

st_is_valid() from the sf package helps to check whether a geometry is valid. From the output, there are no invalid geometries.

3.1.3 Check for Missing values

The code chunk below uses is.na() from base R checks whether the data set has NA values. which() from base R takes the indices of these values and lastly length() helps us calculate the length of the data objects.

length(which(is.na(geoDKI) == TRUE))
[1] 14

In the above output, there are 14 NA values within the jakarta data set.

We will just hold onto these 14 NA values and move on first.

3.1.4 Check Coordinate System

As different countries use different projection systems, we need to first check the CRS of jakarta.

The code chunk below uses st_crs() from the sf package:

st_crs(geoDKI)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

We notice that jakarta is using EPSG::4326 which is the wrong projection coordinate system. DKI Jakarta uses the DGN95, the ‘Datum Geodesi Nasional 1995’, EPSG::23878

We can transform the crs by using st_transform() from the sf package:

geoDKI <- geoDKI %>%
  st_transform(crs = 23878)
st_crs(geoDKI)
Coordinate Reference System:
  User input: EPSG:23878 
  wkt:
PROJCRS["DGN95 / UTM zone 48S",
    BASEGEOGCRS["DGN95",
        DATUM["Datum Geodesi Nasional 1995",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4755]],
    CONVERSION["UTM zone 48S",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",105,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",10000000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Indonesia - south of equator and between 102°E and 108°E - onshore and offshore."],
        BBOX[-10.73,102,0,108.01]],
    ID["EPSG",23878]]

From the output above, we can see that the CRS has been properly assigned.

3.1.5 Removing Outer Islands

Let us visualise the geographical polygons.

qtm(geoDKI)

We can see from the output that jakarta includes both the mainland and the outer islands. Our study area focuses only on the mainland and thus we need to remove them.

View(geoDKI)

Before we continue further, we need to understand how DKI Jakarta geographical regions are segmented. The code chunk above lets you view the entire dataset. Let us understand the key variables below:

With the help of uncle google, we will translate the names.

Name Translation
KODE_DESA Village Code (Sub-District Codes)
DESA Village
PROVINSI Province
KAB_KOTA City
KECAMATAN Sub-District

KAB_KOTA would be the most logical choice in isolating out the outer islands. This will be shown in the plots later.

The code chunk below helps to output unique values in KAB_KOTA field.

unique(geoDKI$KAB_KOTA)
[1] "JAKARTA BARAT"    "JAKARTA PUSAT"    "KEPULAUAN SERIBU" "JAKARTA UTARA"   
[5] "JAKARTA TIMUR"    "JAKARTA SELATAN"  NA                

The output above shows that there are 6 major cities in DKI Jakarta, ignoring the NA value.

The codechunk below will visualize the data with respect to the 6 major cities. We can then see which city isolates out the outer islands.

tmap_mode("view")
tm_shape(geoDKI) +
  tm_polygons("KAB_KOTA")

3.1.5.1 Dealing with NA values

As seen in the above study area plot, the NA values in geoDKI pertains to the information of small areas that is shaded white.

By viewing geoDKI, row 243 and 244 is where all the NA resides at. From checking the area bounds online, the white sub-districts pertains to city region, JAKARTA UTARA. As we could not find the KODE DESA (village code), we will just assign unique values: 3188888801 and 3188888802 for now.

This ensures completeness in our geographical plots later on.

geoDKI$KAB_KOTA[243]<-"JAKARTA UTARA"
geoDKI$KAB_KOTA[244]<-"JAKARTA UTARA"

geoDKI$KODE_DESA[243]<-"3188888801"
geoDKI$KODE_DESA[244]<-"3188888802"

3.1.5.2 Filtering out Outer Islands

From the visualization above, we can see that KEPULAUAN SERIBU is not part of the mainland. We can then use filter() from dplyr package to remove the outer islands.

geoDKI <- filter(geoDKI, KAB_KOTA !="KEPULAUAN SERIBU")
tmap_mode("plot")
tm_shape(geoDKI) + 
  tm_polygons("KAB_KOTA")

Now, before we move on further, geoDKI has alot of columns. Since we are only interested in the subdistrict level, we will select KODE_DESA and rename it to village_code.

Note: village_code is also known as the subdistrict code.

geoDKI <- geoDKI %>%
  select(2, "geometry") %>%
  rename(village_code = `KODE_DESA`)

3.2 Aspatial Data

3.2.1 Import Aspatial Data

Before we import the data in, the file names are rather long. First, go to your data folder and change the aspatial data file names to Y-M format. It would look like this:

There are a total of 12 excel files we need to load into Rstudio. To read the files more efficiently, we will use the “for loop” function to read all the excel files into a data frame by using the read_excel() from the readxl package.

# Set the working directory to the folder containing the Excel files
setwd("data/aspatial/")
# Get a list of all Excel files in the directory
aspatial_data <- list.files(pattern = ".xlsx")

# Loop through the files and read each one into a data frame
for (i in aspatial_data) {
  assign(gsub(".xlsx", "", i), read_excel(i))
}

3.2.2 Columns of interest and its translation

In the aspatial data set, what we want is the total vaccination and not vaccinated numbers, along with these 4 regional classification columns.

names(`2022-6`)
 [1] "KODE KELURAHAN"                            
 [2] "WILAYAH KOTA"                              
 [3] "KECAMATAN"                                 
 [4] "KELURAHAN"                                 
 [5] "SASARAN"                                   
 [6] "BELUM VAKSIN"                              
 [7] "JUMLAH\r\nDOSIS 1"                         
 [8] "JUMLAH\r\nDOSIS 2"                         
 [9] "JUMLAH\r\nDOSIS 3"                         
[10] "TOTAL VAKSIN\r\nDIBERIKAN"                 
[11] "LANSIA\r\nDOSIS 1"                         
[12] "LANSIA\r\nDOSIS 2"                         
[13] "LANSIA\r\nDOSIS 3"                         
[14] "LANSIA TOTAL \r\nVAKSIN DIBERIKAN"         
[15] "PELAYAN PUBLIK\r\nDOSIS 1"                 
[16] "PELAYAN PUBLIK\r\nDOSIS 2"                 
[17] "PELAYAN PUBLIK\r\nDOSIS 3"                 
[18] "PELAYAN PUBLIK TOTAL\r\nVAKSIN DIBERIKAN"  
[19] "GOTONG ROYONG\r\nDOSIS 1"                  
[20] "GOTONG ROYONG\r\nDOSIS 2"                  
[21] "GOTONG ROYONG\r\nDOSIS 3"                  
[22] "GOTONG ROYONG TOTAL\r\nVAKSIN DIBERIKAN"   
[23] "TENAGA KESEHATAN\r\nDOSIS 1"               
[24] "TENAGA KESEHATAN\r\nDOSIS 2"               
[25] "TENAGA KESEHATAN\r\nDOSIS 3"               
[26] "TENAGA KESEHATAN TOTAL\r\nVAKSIN DIBERIKAN"
[27] "TAHAPAN 3\r\nDOSIS 1"                      
[28] "TAHAPAN 3\r\nDOSIS 2"                      
[29] "TAHAPAN 3\r\nDOSIS 3"                      
[30] "TAHAPAN 3 TOTAL\r\nVAKSIN DIBERIKAN"       
[31] "REMAJA\r\nDOSIS 1"                         
[32] "REMAJA\r\nDOSIS 2"                         
[33] "REMAJA\r\nDOSIS 3"                         
[34] "REMAJA TOTAL\r\nVAKSIN DIBERIKAN"          
Name Translation
KODE KELURAHAN Village Code
WILAYAH KOTA City Region
SASARAN Vaccinated
BELUM VAKSIN Not Vaccinated

3.2.3 Mutate Aspatial Data

3.2.3.1 Mutating using for loop

As there are 12 data sets, we will use the for loop that mutate, rename and select the fields that we want. The code chunk below does the following:

  1. Renames KODE KELURAHAN, WILAYAH KOTA, SASARAN and BELUM VAKSIN to village_code, city_region, vaccinated and not_vaccinated respectively by using rename() from the dplyr package
  2. Selects the renamed columns by using select() from the dplyr package
  3. Adds a date column by using mutate() from the dplyr package

All the mutated aspatial data frames are then placed into a list

list_mth<- list(`2021-7`,`2021-8`,`2021-9`,`2021-10`,`2021-11`,`2021-12`,`2022-1`,`2022-2`,`2022-3`,`2022-4`,`2022-5`,`2022-6`)

date <- c("2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
          "2022-01-01", "2022-02-01", "2022-03-01","2022-04-01", "2022-05-01", "2022-06-01")

lists <- list()
for (i in c(1:12)){
  lists[[i]]<- list_mth[[i]] %>% 
    rename(village_code=`KODE KELURAHAN`,
           city_region =`WILAYAH KOTA`,
           vaccinated= `SASARAN`, 
           not_vaccinated =`BELUM VAKSIN`) %>% 
  select(village_code,city_region, not_vaccinated ,vaccinated) %>%
    mutate(date = as.Date(date[i]), 
           .before=1)
}

3.2.3.2 Combine into a single dataframe from a list of dataframes

Afterwhich, we can use Reduce() and rbind from base R to join all dataframes in the lists as one dataframe.

The code chunk below does this:

aspatial_data <- Reduce(rbind, lists)
glimpse(aspatial_data)
Rows: 3,216
Columns: 5
$ date           <date> 2021-07-01, 2021-07-01, 2021-07-01, 2021-07-01, 2021-0…
$ village_code   <chr> NA, "3172051003", "3173041007", "3175041005", "31750310…
$ city_region    <chr> NA, "JAKARTA UTARA", "JAKARTA BARAT", "JAKARTA TIMUR", …
$ not_vaccinated <dbl> 5041111, 13272, 16477, 18849, 5743, 15407, 12503, 11268…
$ vaccinated     <dbl> 7739060, 20393, 25785, 25158, 8683, 22768, 18930, 20267…

We can see from the output that we have the columns that we want in its new name and having 3216 rows.

From the glimpse output, we can see that there are NA values inside the dataframe. If you View the original individual files, you will notice that there will be a row that calculates the total of a respective column and that row contains NA values.

3.2.3.3 Final steps to take

So these are the steps to take,

  1. Remove the NA rows in the dataframe
  2. Filter out the outer islands
    • You will notice from the output of the code chunk below that outer islands is categorised as KAB.ADM.KEP.SERIBU
unique(aspatial_data$city_region)
[1] NA                   "JAKARTA UTARA"      "JAKARTA BARAT"     
[4] "JAKARTA TIMUR"      "JAKARTA SELATAN"    "JAKARTA PUSAT"     
[7] "KAB.ADM.KEP.SERIBU"
  1. Add in vaccination rate column
    • Formula:

The code chunk below does this:

aspatial_data <- aspatial_data %>%
  na.exclude() %>%
  filter(city_region != "KAB.ADM.KEP.SERIBU") %>% 
  mutate(total_population = vaccinated + not_vaccinated, 
         vaccination_rate = as.numeric(vaccinated/total_population)) %>%
  select(date, village_code, vaccination_rate)
  1. Add in village code: 3188888801 and 3188888802

Recall that there are two polygons that held missing values.

setdiff(geoDKI$village_code, aspatial_data$village_code)
[1] "3188888801" "3188888802"

We notice that in the aspatial data, no information were collected for these two sub-districts. Hence, we have to add in these two subdistricts so that both dataframes will match when it is joint later on.

Note: Each month must have one observation, vaccination rate for the added rows will equal 0.

aspatial_data <- rbind(aspatial_data, c("2021-07-01", 3188888801,NA),
                       c("2021-08-01", 3188888801,NA),
                       c("2021-09-01", 3188888801,NA),
                       c("2021-10-01", 3188888801,NA),
                       c("2021-11-01", 3188888801,NA),
                       c("2021-12-01", 3188888801,NA),
                       c("2022-01-01", 3188888801,NA),
                       c("2022-02-01", 3188888801,NA),
                       c("2022-03-01", 3188888801,NA),
                       c("2022-04-01", 3188888801,NA),
                       c("2022-05-01", 3188888801,NA),
                       c("2022-06-01", 3188888801,NA),
                       c("2021-07-01", 3188888802,NA),
                       c("2021-08-01", 3188888802,NA),
                       c("2021-09-01", 3188888802,NA),
                       c("2021-10-01", 3188888802,NA),
                       c("2021-11-01", 3188888802,NA),
                       c("2021-12-01", 3188888802,NA),
                       c("2022-01-01", 3188888802,NA),
                       c("2022-02-01", 3188888802,NA),
                       c("2022-03-01", 3188888802,NA),
                       c("2022-04-01", 3188888802,NA),
                       c("2022-05-01", 3188888802,NA),
                       c("2022-06-01", 3188888802,NA))

Both aspatial_data and geoDKI have the same number of unique village code.
The code chunk below joins them together by their village code, sets vaccination_rate as numeric and excludes the NA values in the joined dataframe.

vaccination <- left_join(aspatial_data, geoDKI, 
                         by = c("village_code")) %>%
  mutate(vaccination_rate = as.numeric(vaccination_rate)) %>%
  na.exclude()
vaccination <- st_as_sf(vaccination)

4 Choropleth Mapping and Analysis

4.1 Visualizing the data

4.1.1 iR Shiny

Something extra to this TakeHome Assignment will be the use of the ShinyApp. This is a short introduction as to how Rshiny works:

  1. Define the UI:
    • You will have an input and in this case, we will use selectInput(“dates”, “Pick a month”, date, selected = “July 2021”,multiple = FALSE).

      • “dates”: This is the variable name to call into the output portion in the server

      • “Pick a month”: This is a text under the user interface to ask the user to pick a choice

      • date: This is the vector of choices that they can pick from.

      • selected = “July 2021” : This sets the choice “July 2021” as the default when starting up the app

      • multiple = FALSE : Prevents the user from picking multiple options and can only choose 1

  2. Define Server:
    • This is where tmap is used and calls upon the variable name defined in UI which is “dates”.
library(shiny)

date <- c("2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01")
ui <- fluidPage(
  selectInput("dates", "Pick a date",
              date, selected = "2021-07-01",
              multiple = FALSE),

  tmapOutput("my_map")
)

# Define the server
server <- function(input, output) {
  # Render the tmap in the output element
  output$my_map <- renderTmap({
    a <- vaccination |>
      filter(date == input$dates)
    
    tm_shape(a) +
      tm_fill("vaccination_rate",
              n = 6,
              style = "quantile",
              palette = "Blues",
              title = "Vaccination Rate") +
      tm_layout(main.title = paste(input$dates),
                main.title.position = "left",
                legend.height = 0.8, 
                legend.width = 0.8,
                frame = TRUE) +
      tm_borders(alpha = 0.5) +
      tm_grid(alpha =0.2)
  })
}

# Run the app
shinyApp(ui, server)

Refer to visual plot : ShinyApp

Note: There are many ways to create and design the shiny app, under the user interface. One example is using a sliderInput() to let the user their zoom level. This can be found under R shiny documentation.

4.1.2 Tmap Visualization

Considering we have 12 months of Tmap visualization to do, it would be better to create a tmap function.

The code chunk below first filters out vaccination dataframe into their respective months and then inputs the filtered dataframe into the tm_shape().

graphing <- function(x){
  a <- vaccination %>%
    filter(date == x)
  tm_shape(a) +
  tm_fill("vaccination_rate",
          n=10,
          style = "quantile",
          palette = "Blues",
          title = "Vaccination Rate") +
  tm_layout(main.title = paste(x),
            main.title.position = "left",
            legend.height = 0.8, 
            legend.width = 0.8,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_grid(alpha =0.2)
}

We will split the plots into 2 code chunks to reduce the number of graphs in a single output for a clearer view.

tmap_mode("plot")
tmap_arrange(graphing("2021-07-01"),
             graphing("2021-08-01"),
             graphing("2021-09-01"),
             graphing("2021-10-01"),
             graphing("2021-11-01"),
             graphing("2021-12-01"),
             ncol = 2)

tmap_arrange(graphing("2022-01-01"),
             graphing("2022-02-01"),
             graphing("2022-03-01"),
             graphing("2022-04-01"),
             graphing("2022-05-01"),
             graphing("2022-06-01"),
             ncol = 2)

Likewise, this can be done on Rshiny app but this will be covered on a later date when I have attended the Rshiny workshop. :)

4.2 Spatial Patterns observed (200words)

5 Local Gi* Analysis

To bring you back, here is the task assigned in this section:

  • Compute local Gi* values of the monthly vaccination rate,

  • Display the Gi* maps of the monthly vaccination rate. The maps should only display the significant (i.e. p-value < 0.05)

  • With reference to the analysis results, draw statistical conclusions (not more than 250 words)

5.1 Computing Contiguity Spatial Weights

Before we can compute the global spatial autocorrelation statistics, we need to construct the spatial weights of the study area.

Spatial weights: Used to define the neighbourhood relationships between geographical units in the study area.

As we are dealing with 12 different months of vaccination rates, we will have to filter the dataframe to their respective months and calculate their respective weights. Here, we will be using the QUEEN mode and thus the argument: queen =TRUE. The code chunk below shows the function that filters to the selected month and then calculate their weights.

month <- vaccination %>%
  filter(date == "2021-07-01")
wm_q <- month %>%
  na.exclude() %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)
wm_q
Simple feature collection with 261 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 686587.9 ymin: 9295420 xmax: 718314.5 ymax: 9326648
Projected CRS: DGN95 / UTM zone 48S
# A tibble: 261 × 6
   nb        wt        date       village_code vacci…¹                  geometry
 * <nb>      <list>    <date>     <chr>          <dbl>        <MULTIPOLYGON [m]>
 1 <int [9]> <dbl [9]> 2021-07-01 3172051003     0.606 (((706161.9 9323032, 706…
 2 <int [4]> <dbl [4]> 2021-07-01 3173041007     0.610 (((699268 9320073, 69927…
 3 <int [4]> <dbl [4]> 2021-07-01 3175041005     0.572 (((705677.8 9306675, 705…
 4 <int [7]> <dbl [7]> 2021-07-01 3175031003     0.602 (((706070 9313044, 70607…
 5 <int [4]> <dbl [4]> 2021-07-01 3175101006     0.596 (((711830 9302993, 71181…
 6 <int [8]> <dbl [8]> 2021-07-01 3174031002     0.602 (((700371 9308387, 70038…
 7 <int [4]> <dbl [4]> 2021-07-01 3175051002     0.643 (((704532.3 9301182, 704…
 8 <int [5]> <dbl [5]> 2021-07-01 3175041004     0.577 (((706784.6 9305766, 706…
 9 <int [6]> <dbl [6]> 2021-07-01 3171071002     0.623 (((700406 9314413, 70040…
10 <int [7]> <dbl [7]> 2021-07-01 3175031002     0.580 (((706499.5 9311596, 706…
# … with 251 more rows, and abbreviated variable name ¹​vaccination_rate

5.2 Computing local Moran’s I

Next, we will use local_moran() of the sfdep package to calculate local Moran’s I.

set.seed(1234)
july_LGI <- wm_q %>%
  mutate(local_Gi = local_gstar_perm(vaccination_rate,
                                   nb,
                                   wt,
                                   nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

5.3 Visualizing p-value of local Gi

tmap_mode("plot")
tm_shape(july_LGI) +
    tm_polygons() +
    tm_shape(july_LGI %>% filter(p_sim <0.05)) +
    tm_fill("gi_star") +
    tm_borders(alpha = 0.5) +
    tm_layout(main.title = paste("significant local Gi", "(", july_LGI$date[1],")"),
              main.title.size = 0.8)

  return(plot)
standardGeneric for "plot" defined from package "base"

function (x, y, ...) 
standardGeneric("plot")
<environment: 0x1420da750>
Methods may be defined for arguments: x, y
Use  showMethods(plot)  for currently available ones.

5.4 Putting it all together

As you follow along, these are the 3 main steps to creating the plots required for this section. As mentioned above, we will combine these 3 sections into a single function to calculate the local Gi for each month. The code chunk below takes in input “mth” to filter vaccination to the respective month and calculate the weights and the local Moran’s I. ### Creating the local MI computation function

lisa <- function(mth){
  set.seed(1234)
  month <- vaccination %>%
    filter(date ==mth)
  wm_q <- month %>%
    na.exclude() %>%
    mutate(nb = st_contiguity(geometry),
           wt = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1))
  result <- wm_q %>%
    mutate(local_gi = local_gstar_perm(vaccination_rate,
                                   nb,
                                   wt,
                                   nsim = 99),
           .before = 1) %>% 
    unnest(local_gi)
  return(result)
}

Now we will calculate the local Gi for each month and place it into a list for easy referencing for the plots later.

date <- c("2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
          "2022-01-01", "2022-02-01", "2022-03-01","2022-04-01", "2022-05-01", "2022-06-01")
lisa_GI <- list()
for (i in 1:12){
  lisa_GI[[i]] <- lisa(date[i])
}

Now we have a list of dataframes with local Moran I computed. Note that lisa_LMI[[1]] refers to July 2021, in ascending order.

5.4.1 Creating the Tmap function

graph_lisa <- function(x){
  HCSA_sig <- x %>%
    filter(p_sim <0.05)
  HCSA_plots <- tm_shape(x) +
    tm_polygons() +
    tm_borders(alpha = 0.5) +
    tm_shape(HCSA_sig) +
      tm_fill("gi_star",
              palette = "RdYlGn",
              midpoint = 0) +
      tm_borders(alpha = 0.4) +
    tm_layout(main.title = paste("significant local Gi", "(",x$date[1],")"),
              main.title.size = 0.8)
  return(HCSA_plots)
}
tmap_mode("plot")
tmap_arrange(graph_lisa(lisa_GI[[1]]),
             graph_lisa(lisa_GI[[2]]),
             graph_lisa(lisa_GI[[3]]),
             graph_lisa(lisa_GI[[4]]),
             graph_lisa(lisa_GI[[5]]),
             graph_lisa(lisa_GI[[6]]),
             ncol =2)

tmap_arrange(graph_lisa(lisa_GI[[7]]),
             graph_lisa(lisa_GI[[8]]),
             graph_lisa(lisa_GI[[9]]),
             graph_lisa(lisa_GI[[10]]),
             graph_lisa(lisa_GI[[11]]),
             graph_lisa(lisa_GI[[12]]),
             ncol =2)

5.5 Statistical Conclusion (Not more than 250words) :

6 Emerging Hot Spot Analysis (EHSA)

6.1 Creating a Time Series Cube

The code chunk below creates a spatio-temporal cube by using spacetime() from the sfdep package.

vaccination_rate_st<- as_spacetime(vaccination,
                                .loc_col = "village_code",
                                .time_col = "date")
vaccination_rate_nb <- vaccination_rate_st %>%
  activate("geometry") %>%
  mutate(
    nb = include_self(st_contiguity(geometry)),
    wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
    .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
gi_stars <- vaccination_rate_nb %>%
  group_by(date) %>%
  mutate(vaccination_rate = as.numeric(vaccination_rate),
         gi_star = local_gstar_perm(vaccination_rate, 
                                    nb, 
                                    wt, 
                                    nsim = 99)) %>%
  tidyr::unnest(gi_star)

6.2 Mann-Kendall Test

Subdistricts Chosen:

  1. 3174101005
  2. 3173031006
  3. 3174091001
cbg_1 <- gi_stars %>%
  ungroup() %>%
  filter(village_code == "3174101005") %>%
  select(date, village_code, gi_star)
ggplot(data = cbg_1, 
       aes(x = date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

cbg_1 %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
    tau      sl     S     D  varS
  <dbl>   <dbl> <dbl> <dbl> <dbl>
1 0.727 0.00127    48  66.0  213.
cbg_2 <- gi_stars %>%
  ungroup() %>%
  filter(village_code == "3173031006") %>%
  select(date, village_code, gi_star)
ggplot(data = cbg_2, 
       aes(x = date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

cbg_2 %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
    tau    sl     S     D  varS
  <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.121 0.631     8  66.0  213.
cbg_3 <- gi_stars %>%
  ungroup() %>%
  filter(village_code == "3174091001") %>%
  select(date, village_code, gi_star)
ggplot(data = cbg_3, 
       aes(x = date, 
           y = gi_star)) +
  geom_line() +
  theme_light()

cbg_3 %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
    tau      sl     S     D  varS
  <dbl>   <dbl> <dbl> <dbl> <dbl>
1 0.697 0.00203    46  66.0  213.
ehsa <- gi_stars %>%
  group_by(date) %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>%
  tidyr::unnest_wider(mk)
emerging <- ehsa %>% 
  arrange(sl, abs(tau)) %>% 
  slice(1:5)

6.3

ehsa <- emerging_hotspot_analysis(
  x = vaccination_rate_st,
  .var = "vaccination_rate",
  k = 1,
  nsim = 99
)
ggplot(data = ehsa,
       aes(x = classification)) +
  geom_bar()

vaccination_ehsa <- left_join(geoDKI, ehsa, by = c("village_code" = "location"))
ehsa_sig <- vaccination_ehsa  %>%
  filter(p_value < 0.05)
tmap_mode("view")
tm_shape(vaccination_ehsa) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
  tm_fill("classification") + 
  tm_borders(alpha = 0.4)